Resolution of Coordination Ellipses in Biological Named Entities Using Conditional Random Fields
Authors
Abstract
Coordination ellipsis is a linguistic phenomenon in which information is omitted from the surface form of a coordination, on the assumption that readers can easily reconstruct the omitted material given appropriate background knowledge. The phenomenon is particularly common in scientific jargon. For named entity recognition (NER), failure to resolve coordination ellipsis seriously degrades system performance: NER systems usually either misrecognize an elliptical coordination as a single complex entity mention or correctly classify only its non-elliptical parts. As an alternative, we propose a methodology for decomposing complex coordinated entity expressions into their constituent conjuncts, determining the missing elements, and thereby explicitly reconstructing every single entity mention. We present a novel supervised machine learning approach to resolving elliptical coordinations in noun phrases. On the task of conjunct identification, the model achieves a performance of 93% on the biomedical GENIA corpus.
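The reconstruction step described in the abstract can be illustrated with a minimal sketch. It assumes that each token of a coordinated noun phrase has already been labeled, for example by a sequence model such as a CRF. The label names ("C" for a conjunct token, "CC" for a coordinator or separator, "A" for a shared token elided from some conjuncts) and the function `reconstruct_mentions` are hypothetical and are not taken from the paper; the sketch covers only the final expansion, not the learning step.

```python
from typing import List


def reconstruct_mentions(tokens: List[str], labels: List[str]) -> List[str]:
    """Expand an elliptical coordination into explicit entity mentions.

    Hypothetical label scheme (an assumption, not the paper's):
      "C"  = token belonging to a conjunct
      "CC" = coordinator or separator ("and", "or", ",")
      "A"  = shared token elided from some of the conjuncts
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    prefix: List[str] = []           # shared tokens before the first conjunct
    suffix: List[str] = []           # shared tokens after a conjunct
    conjuncts: List[List[str]] = []  # completed conjuncts
    current: List[str] = []          # conjunct currently being collected
    seen_conjunct = False

    for token, label in zip(tokens, labels):
        if label == "C":
            current.append(token)
            seen_conjunct = True
            continue
        # Any non-conjunct token closes the conjunct being collected.
        if current:
            conjuncts.append(current)
            current = []
        if label == "A":
            (suffix if seen_conjunct else prefix).append(token)

    if current:
        conjuncts.append(current)

    # Attach the shared material to every conjunct.
    return [" ".join(prefix + conjunct + suffix) for conjunct in conjuncts]


# Shared head noun follows the conjuncts.
print(reconstruct_mentions(["B", "and", "T", "cells"], ["C", "CC", "C", "A"]))
# ['B cells', 'T cells']

# Shared modifier precedes the conjuncts.
print(reconstruct_mentions(
    ["cell", "growth", "and", "differentiation"], ["A", "C", "CC", "C"]))
# ['cell growth', 'cell differentiation']
```

In this sketch the hard part, deciding which tokens are conjuncts and which are shared, is taken as given; that labeling decision is what the supervised model would have to learn.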
Similar resources
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well on Persian if they are trained with good features. To get good performanc...
A Neural Named Entity Recognition Approach to Biological Entity Identification
We approach the BioCreative VI Track 1 task of biological entity identification by focusing on named entity recognition (NER) and linking tagged entities to standard database identifiers. For this task, we apply recent neural NER techniques that combine bi-directional long short-term memory (BLSTM) network layers with conditional random fields (CRFs) to the biomedical domain. We then use contex...
Domain Focused Named Entity Recognizer for Tamil Using Conditional Random Fields
In this paper, we present a domain-focused Tamil Named Entity Recognizer for the tourism domain. The method handles morphological inflections of named entities (NE) and nested tagging of named entities with a hierarchical tagset of 106 tags. The tagset is designed with a focus on the tourism domain. We have experimented with building Conditional Random Field (CRF) models by training the...
ICSI-CRF: The Generation of References to the Main Subject and Named Entities Using Conditional Random Fields
In this paper, we describe our contribution to the Generation Challenge 2009 for the tasks of generating Referring Expressions to the Main Subject References (MSR) and Named Entities Generation (NEG). To generate the referring expressions, we employ the Conditional Random Fields (CRF) learning technique because the selection of an expression depends on the selection of the previous...
Conditional Random Fields and Support Vector Machines for Disorder Named Entity Recognition in Clinical Texts
We present a comparative study of two machine learning methods, Conditional Random Fields and Support Vector Machines, for clinical named entity recognition. We explore their applicability to the clinical domain. Evaluation against a set of gold standard named entities shows that CRFs outperform SVMs. The best F-score is 0.86 with CRFs and 0.64 with SVMs, compared to a baseline of 0.60.
Journal title:
Volume / Issue:
Pages: -
Publication date: 2007